On-line web traffic sampling

ABSTRACT

A method for tracking and reporting traffic activity on a web site first comprises storing a web page on a first server coupled to a network. When the web page is requested from a visitor computer, the visitor computer is selected (or not selected) for inclusion within a sample group, where the sample group is only a subset of the total traffic to the web site. A selection indicator associated with this selection is stored on the visitor computer. Data mining code within the web site is operated or not operated depending upon the value of the selection indicator. The data mining code is operated, and the activity of the visitor computer on the web site consequently tracked, only if the visitor computer is selected as a member of the sample group; otherwise the data mining code is not operated and no traffic data is generated.

BACKGROUND OF THE INVENTION

[0001] The present application relates to compiling and reporting data associated with activity on a network server and more particularly to compiling only a subset of the potential data to reduce data analysis and reporting hardware needs.

[0002] Programs for analyzing traffic on a network server, such as a worldwide web server, are known in the art. One such prior art program is described in U.S. patent application Ser. No. 09/240,208, filed Jan. 29, 1999, for a Method and Apparatus for Evaluating Visitors to a Web Server, which is incorporated herein by reference for all purposes. NetIQ Corporation owns this application and also owns the present provisional application. In these prior art systems, the program typically runs on the web server that is being monitored. Data is compiled, and reports are generated on demand (or are delivered from time to time via email) to display information about web server activity, such as the most popular page by number of visits, peak hours of website activity, most popular entry page, etc.

[0003] Analyzing activity on a worldwide web server from a different location on a global computer network (“Internet”) is also known in the art. To do so, a provider of remote web-site activity analysis (“service provider”) generates JavaScript code that is distributed to each subscriber to the service. The subscriber copies the code into each web-site page that is to be monitored.

[0004] When a visitor to the subscriber's web site loads one of the web-site pages into his or her computer, the JavaScript code collects information, including time of day, visitor domain, page visited, etc. The code then calls a server operated by the service provider (also located on the Internet) and transmits the collected information thereto as a URL parameter value. Information is also transmitted in a known manner via a cookie.

[0005] Each subscriber has a password to access a page on the service provider's server. This page includes a set of tables that summarize, in real time, activity on the customer's web site.

[0006] The above-described arrangement for monitoring web server activity by a service provider over the Internet is generally known in the art. Examples of the information analyzed include technical data, such as most popular pages, referring URLs, total number of visitors, returning visitors, etc. The basic mechanism of such services is that each tracked web-site page contains JavaScript that requests a 1×1 image from the service provider's server. Other information is sent along with that request, including a cookie that uniquely identifies the visitor. Upon receipt of the request, applicants' WebTrendsLive service records the hit and stages it for full accounting. This is a proven method for tracking web site usage.

[0007] While this mechanism works, the resources it requires of the service provider (e.g. bandwidth and processing) grow as traffic to the tracked sites increases.

[0008] Accordingly, the need still remains for a way to reduce the resources necessary to track and report web-page traffic while maintaining accuracy in the statistics obtained.

SUMMARY OF THE INVENTION

[0009] For very busy sites, it makes sense to sample the traffic rather than record each and every hit. The result is a lower cost of operations.

[0010] The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention that proceeds with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 is a schematic view of a portion of the Internet on which the invention is operated.

[0012] FIG. 2 is a flow diagram illustrating a method for sampling web page traffic according to a preferred embodiment of the invention.

[0013] FIG. 3 is a flow diagram illustrating a method for sampling web page traffic according to an alternate embodiment of the invention.

[0014] FIG. 4 is a flow diagram illustrating a method for sampling web page traffic according to a second alternate embodiment of the invention.

[0015] APPENDIX lists a preferred implementation of the code operable to create a sample population per the method shown in FIG. 2.

DETAILED DESCRIPTION

[0016] Turning now to FIG. 1, indicated generally at 10 is a highly schematic view of a portion of the Internet. FIG. 1 depicts a system implementing the present invention. Included thereon is a worldwide web server 12. Server 12, in the present example, is operated by a business that sells products via server 12, although the same implementation can be made for sales of services via the server. The server includes a plurality of pages describing the business and the products offered for sale, which a site visitor can download to his or her computer, like computer 14, using a conventional browser program running on the computer.

[0017] As mentioned above, it would be advantageous to the seller to have an understanding of how customers and potential customers use server 12. As also mentioned above, it is known to obtain this understanding by analyzing web-server log files at the server that supports the selling web site. It is also known in the art to collect data over the Internet and generate activity reports at a remote server.

[0018] When the owner of server 12 first decides to utilize a remote service provider to generate such reports, he or she uses a computer 16, which is equipped with a web browser, to visit a web server 18 operated by the service provider. On server 18, the subscriber opens an account and creates a format for real-time reporting of activity on server 12.

[0019] To generate such reporting, server 18 provides computer 16 with a small piece of code, typically JavaScript code (data mining code). The subscriber simply copies and pastes this code onto each web page maintained on server 12 for which monitoring is desired. When a visitor from computer 14 (client node) loads one of the web pages having the embedded code therein, the code passes predetermined information from computer 14 to a server 20 (also operated by the service provider) via the Internet. This information includes, e.g., the page viewed, the time of the view, the length of stay on the page, the visitor's identification, etc. Server 20 in turn transmits this information to an analysis server 22, which is also maintained by the service provider. This server analyzes the raw data collected on server 20 and passes it to a database server 24 that the service provider also operates.

[0020] When the subscriber would like to see and print real-time statistics, the subscriber uses computer 16 to access server 18, which in turn is connected to database server 24 at the service provider's location. The owner can then see and print reports, like those available through the webtrendslive.com reporting service operated by the assignee of this application, that provide real-time information about the activity at server 12.

[0021] The data mining code embedded within the web page script operates to gather data about the visitor's computer. Also included within the web page script is a request for a 1×1 pixel image whose source is server 20. The 1×1 pixel image is too small to be viewed on the visitor's computer screen and is simply a method for sending information to server 20, which logs all web traffic information for processing by server 22.

[0022] The data mined from the visitor computer by the data mining code is attached as a code string to the end of the image request sent to the server 20. By setting the source of the image to a variable built by the script (e.g. www.webtrendslive.com/button3.asp?id39786c45629t120145), all the gathered information can be passed to the web server doing the logging. In this case, for instance, the string “id39786c45629t120145” is sent to the webtrendslive.com web site and is interpreted by a decoder program built into the data analysis server to mean that a user with ID #39786 loaded client web site #45629 in 4.5 seconds and spent 1:20 minutes there before moving to another web site.
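The image-request technique of [0021] and [0022] can be sketched as below. This is a minimal illustration, not the WebTrendsLive encoding: the collector URL and the query parameter names (`id`, `site`, `loadMs`, `stayMs`) are hypothetical, and standard `URL`/`URLSearchParams` encoding is used in place of the compact string shown above.

```javascript
// Build the beacon URL: every gathered value travels as a query parameter.
// The base URL and parameter names are hypothetical examples.
function buildBeaconUrl(base, data) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Issue the request by assigning the URL to an image's src. In a browser,
// createImage would be () => new Image(); the image never needs to be
// added to the page, since only the resulting GET request matters.
function requestBeaconImage(createImage, url) {
  const img = createImage();
  img.src = url;
  return img;
}

// Example using the values from the text: visitor 39786, site 45629,
// 4.5 s load time, 1:20 (80 s) stay.
const beaconUrl = buildBeaconUrl("https://collector.example.com/button3.asp", {
  id: 39786,
  site: 45629,
  loadMs: 4500,
  stayMs: 80000,
});
```

Here `beaconUrl` becomes `https://collector.example.com/button3.asp?id=39786&site=45629&loadMs=4500&stayMs=80000`. Named parameters trade a few bytes per request for a format the logging server can decode without a custom parser.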

[0023] Another method for tracking visitors to a web site is through the use of objects called “cookies.” A cookie is a piece of text that a web server can store on a user's hard disk. Cookies allow a web site to store information on a user's machine and later retrieve it. The pieces of information are stored as “name-value pairs” consisting of, for instance, a variable name (e.g. UserID) and a value (e.g. A9A3BECE0563982D) associated with that variable name.
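In a browser, a page's script sees its cookies through `document.cookie` as a single string of the form `name1=value1; name2=value2`. A minimal sketch of splitting that string into name-value pairs follows; the function name `parseCookies` is illustrative, and values are returned without any decoding.

```javascript
// Split a "name1=value1; name2=value2" cookie string into a Map.
function parseCookies(cookieString) {
  const pairs = new Map();
  for (const part of cookieString.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // ignore fragments without a name=value shape
    const name = part.slice(0, eq).trim();
    if (name) pairs.set(name, part.slice(eq + 1).trim());
  }
  return pairs;
}

// In a browser this would be parseCookies(document.cookie).
const cookies = parseCookies("UserID=A9A3BECE0563982D; track=true");
```

`cookies.get("UserID")` then returns `"A9A3BECE0563982D"`.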

[0024] Taking the web browser Microsoft Internet Explorer as an example, cookies are typically stored on a machine running Windows 9x in a directory called c:\windows\cookies. The directory may list a vast number of name-value pairs, each associated with the particular domain from which it originated, representing all of the web sites that have placed a cookie on that particular computer. An example of a cookie file is shown below:
UserID A9A3BECE0563982D www.netiq.com/

[0025] The cookie above is typical of the type stored on a visitor's computer (hereinafter the client node) when visiting the web site located at the domain netiq.com. The name of the name-value pair is UserID, and the value is A9A3BECE0563982D. Both the name and value of the pair are generated according to an algorithm programmed in the cookie server associated with the domain web site. The first time the client node browses the netiq.com web site, software on that web site assigns a unique ID number to the visitor and instructs the browser on the client node to store the name-value pair as a cookie in a designated folder where it can be retrieved later. The same name-value pair data is stored on the netiq.com cookie server (such as customer server 12) along with other information so that the visitor can be identified later.
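The text does not specify the ID algorithm used by the cookie server. As one hypothetical possibility, a Node.js server could draw 8 random bytes and render them as 16 uppercase hexadecimal digits, matching the shape of the example value A9A3BECE0563982D, then return the pair in a `Set-Cookie` header. Both helper names below are illustrative.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical ID scheme: 8 cryptographically random bytes -> 16 hex digits.
function newVisitorId() {
  return nodeCrypto.randomBytes(8).toString("hex").toUpperCase();
}

// Set-Cookie header value asking the browser to keep the pair for the whole site.
function userIdSetCookie(id) {
  return `UserID=${id}; Path=/`;
}

const id = newVisitorId();        // differs on every run
const header = userIdSetCookie(id);
```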

[0026] Cookies operate according to an industry standard commonly called the “Cookie RFC” (Request for Comments).

[0027] A more complicated example of a cookie is shown below in reference to the eCommerce web site amazon.com. Visits to the amazon.com web site result in the storage of a more comprehensive set of information on the client node visiting the web site. The resulting cookie from such a visit consists of the following “crumbs”:

[0028] Session-id-time 954242000 amazon.com/

[0029] Session-id 002-4135256-7625846 amazon.com/

[0030] x-main eKQIfwnxuF7qtmX52x6VWAXh@ih6Uo5H amazon.com/

[0031] ubid-main 077-9263437-9645324 amazon.com/

[0032] Each of these portions of the cookie, or “crumbs”, is associated with the amazon.com domain. Based on these crumbs, it appears that amazon.com stores a main user ID, an ID for each session, and the time the session started on the visitor computer (as well as an x-main value, which could be anything). While the vast majority of sites store just one piece of information (a user ID) on a visitor computer, there is really no limit to the amount of information such sites can store on the visitor computer in name-value pairs.

[0033] A name-value pair is simply a named piece of data. It is not a program, and it cannot “do” anything. A web site can retrieve only the information that it has placed on the client node computer. It cannot retrieve information from other cookie files, or any other information from the visitor's machine.

[0034] The data moves in the following manner. If one were to type the URL of a web site into a computer browser, the browser sends a request to the web site for the page. For example, if one were to type the URL http://www.amazon.com into the browser, the browser will contact Amazon's server and request its home page. When the browser does this, it will look on the requesting machine for a cookie file that Amazon has set. If it finds an Amazon cookie file, the browser will send all of the name-value pairs in the file to Amazon's server along with the URL. If it finds no cookie file, it will send no cookie data. Amazon's web server receives the cookie data and the request for a page. If name-value pairs are received, Amazon can use them.

[0035] If no name-value pairs are received, Amazon knows that the visitor operating that computer has not visited before. The server creates a new ID for that visitor in Amazon's database and then sends name-value pairs to the computer in the header for the web page it sends. The computer stores the name-value pairs on its hard disk drive according to the Cookie RFC protocol.

[0036] The web server can change name-value pairs or add new pairs whenever the visitor visits the site and requests a page.

[0037] There are other pieces of information that the server can send with the name-value pair. One of these is an expiration date. Another is a path, so that the site can associate different cookie values with different parts of the site.

[0038] Cookies evolved because they solve a big problem for the people who implement web sites. In the broadest sense, a cookie allows a site to store state information on a visitor's computer. This information lets a web site remember what state the browser is in. An ID is one simple piece of state information: if an ID exists on the visiting computer, the site knows that the user has visited before. The state is, “Your browser has visited the site at least one time,” and the site knows the user ID from that visit.

[0039] Web sites conventionally use cookies in many different ways. For instance, sites can accurately determine how many readers actually visit the site, which are new as opposed to repeat visitors, and how often each visitor has visited the site. It turns out that because of proxy servers, caching, concentrators and so on, the only way for a site to accurately count visitors is to set a cookie with a unique ID for each visitor. The way the site does this is by using a database. The first time a visitor arrives, the site creates a new ID in the database and sends the ID as a cookie. The next time the user comes back, the site can increment a counter associated with that ID in the database and know how many times that visitor returns.
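The new-versus-returning logic of [0035] and [0039] can be sketched as follows. A `Map` stands in for the site's database and `makeId` is a caller-supplied ID generator; both are assumptions for illustration, not part of the described system.

```javascript
// visitsById stands in for the site's database: visitor ID -> visit count.
function recordVisit(visitsById, cookieId, makeId) {
  if (cookieId !== undefined && visitsById.has(cookieId)) {
    // Returning visitor: the browser sent a known ID, so bump its counter.
    const visits = visitsById.get(cookieId) + 1;
    visitsById.set(cookieId, visits);
    return { id: cookieId, isNew: false, visits, setCookie: null };
  }
  // No known ID: create one in the database and send it back as a cookie.
  const id = makeId();
  visitsById.set(id, 1);
  return { id, isNew: true, visits: 1, setCookie: `UserID=${id}; Path=/` };
}

const db = new Map();
const first = recordVisit(db, undefined, () => "A9A3BECE0563982D");
const second = recordVisit(db, first.id, () => "UNUSED");
```

The first call creates the ID and returns a `Set-Cookie` value; the second finds the ID and reports two visits.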

[0040] Sites can also store user preferences so that the site can look different for each visitor (often referred to as customization). For example, if one were to visit msn.com, it offers the visitor the ability to change content/layout/color. It also allows one to enter a zip code and get customized weather information. When the zip code is entered, the following name-value pair is an example of what might be added to MSN's cookie file:

[0041] WEAT CC=NC%5FRaleigh%2DDurham&REGION=www.msn.com/

[0042] It is apparent from this name-value pair that the visitor is from Raleigh, N.C. Most sites seem to store preferences like this in the site's database and store nothing but an ID as a cookie, but storing the actual values in name-value pairs is another way to do it.
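The weather crumb in [0041] is URL-encoded: `%5F` decodes to an underscore and `%2D` to a hyphen. A small sketch using the standard `decodeURIComponent` function shows how the location is recovered; the helper name `decodeCrumb` is illustrative.

```javascript
// Split "NAME=ENCODED_VALUE" at the first "=" and percent-decode the value.
function decodeCrumb(raw) {
  const eq = raw.indexOf("=");
  if (eq < 0) return { name: raw, value: "" };
  return {
    name: raw.slice(0, eq),
    value: decodeURIComponent(raw.slice(eq + 1)),
  };
}

const weather = decodeCrumb("CC=NC%5FRaleigh%2DDurham");
```

`weather.value` is `"NC_Raleigh-Durham"`, the state and city that identify the visitor's location.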

[0043] eCommerce sites can implement things like shopping carts and “quick checkout” options. The cookie contains an ID and lets the site keep track of a visitor as the visitor adds different things to his or her “shopping cart.” Each item added is stored in the site's database along with the visitor's ID value. When the visitor checks out, the site knows what is in his or her cart by retrieving all of the selections from the database associated with that user or session ID. It would be impossible to implement a convenient shopping mechanism without cookies or something like them.
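A sketch of the cart arrangement in [0043], in which the cookie carries only an ID and the items live on the server. The `Map` standing in for the database and the function names are hypothetical; the session ID reuses the amazon.com example above.

```javascript
// cartsById stands in for the site database: session ID -> selected items.
function addToCart(cartsById, sessionId, item) {
  if (!cartsById.has(sessionId)) cartsById.set(sessionId, []);
  cartsById.get(sessionId).push(item);
}

// At checkout, retrieve every selection stored under the ID, then clear it.
function checkout(cartsById, sessionId) {
  const items = cartsById.get(sessionId) || [];
  cartsById.delete(sessionId);
  return items;
}

const carts = new Map();
addToCart(carts, "002-4135256-7625846", "book");
addToCart(carts, "002-4135256-7625846", "lamp");
const order = checkout(carts, "002-4135256-7625846");
```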

[0044] In all of these examples, note that what the database stores are things the visitor has selected from the site, pages viewed from the site, information given to the site in online forms, etc. All of the information is stored in the site's database, and a cookie containing the visitor's unique ID is all that is stored on the client node 14 (FIG. 1) in most cases. Both the image request method and the cookie method result in each and every web-page visit being reported, a result that can overwhelm the data collection server 20.

[0045] To address this drawback, the present invention operates by sampling only a portion of the data. In the preferred embodiment, the data mining code operates to return traffic data to the data collection server 20 for only a random subset of the computers and/or visitors visiting a web site. This is herein referred to as “sampling.”

[0046] The theory of sampling is that by accounting for a random subset of the whole population, the actual numbers can be extrapolated with high confidence. For sampling web traffic, it is preferable to sample visitors, not hits or visits. If visits or hits are sampled, inaccuracies can arise. For example, if hits are sampled, then the paths through sites are distorted in a way that cannot be reconstructed. Also, some visits would be completely unaccounted for (those that had all of their hits not chosen), whereas other visits are still counted as visits, though perhaps with fewer hits, and there is no easy way to correct for this. The same kinds of problems arise with sampling visits. Therefore, it is preferred to sample visitors, though visitor sampling is not central to the idea of this invention.
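The path distortion described in [0046] can be illustrated by comparing the removal of individual hits with a single keep-or-drop decision per visitor. The selection functions are passed in so the example is deterministic (in practice they would be random), and the names and data are hypothetical.

```javascript
// Group hits by visitor into ordered page paths.
function pathsByVisitor(hits) {
  const paths = new Map();
  for (const { visitor, page } of hits) {
    if (!paths.has(visitor)) paths.set(visitor, []);
    paths.get(visitor).push(page);
  }
  return paths;
}

// Hit sampling: each hit is kept or dropped on its own.
function sampleHits(hits, keepHit) {
  return hits.filter(keepHit);
}

// Visitor sampling: one decision per visitor, reused for all of that
// visitor's hits (the role played by the persistent "track" cookie).
function sampleVisitors(hits, selectVisitor) {
  const decision = new Map();
  return hits.filter(({ visitor }) => {
    if (!decision.has(visitor)) decision.set(visitor, selectVisitor(visitor));
    return decision.get(visitor);
  });
}

const hits = [
  { visitor: "A", page: "home" },
  { visitor: "B", page: "home" },
  { visitor: "A", page: "products" },
  { visitor: "B", page: "about" },
  { visitor: "A", page: "checkout" },
];

// Dropping the third hit leaves visitor A with the false path home -> checkout.
const hitSampled = pathsByVisitor(sampleHits(hits, (_, i) => i !== 2));
// Selecting visitor A keeps the true path home -> products -> checkout.
const visitorSampled = pathsByVisitor(sampleVisitors(hits, (v) => v === "A"));
```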

[0047] To reap the most benefit from sampling, it is preferred that the sampling be performed on the client machine. This is important because it does the best job of reducing the traffic from the client 14 to the service provider 20 as early as possible, thus minimizing the resources required to handle that traffic.

[0048] FIG. 2 is a flow diagram illustrating the preferred method for sampling data on a visitor computer 14. The process begins in block 30, where a web page request has been initiated at the client node 14 and the web-page address (URL) and cookie (if any) associated with that address are sent out through the world-wide network. The request is routed through the world-wide network 10 to the customer web site 12, and the requested web page and attached data mining code are forwarded from the site 12 to the client node 14 responsive to the request.

[0049] The method then proceeds to query block 32, in which it is determined whether cookies can be stored on the client node 14. Conventional browsers offer security features that allow users to block cookies from being stored on their computer. If this feature is turned on, then no cookies can be accepted and the method proceeds to block 34, where the process ends. Because users who elect to block cookies form a definite subset of the computer user population, it is preferred that such users be excluded from the data sampling operation since, otherwise, such a population could skew the normalized traffic statistics.
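Query block 32 can be approximated in the browser by writing a probe cookie and reading it back (modern browsers also expose `navigator.cookieEnabled`). A minimal sketch follows; the probe name `wtl_cookietest` is hypothetical, and the document object is passed in so the logic can be exercised outside a browser.

```javascript
// Returns true if a cookie written to doc.cookie can be read back.
// In a browser, call cookiesAccepted(document).
function cookiesAccepted(doc) {
  doc.cookie = "wtl_cookietest=1; path=/";
  const accepted = doc.cookie.indexOf("wtl_cookietest=1") !== -1;
  // Expire the probe immediately so it does not linger on the client node.
  doc.cookie = "wtl_cookietest=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/";
  return accepted;
}
```

A false result corresponds to the path from block 32 to block 34, where the process ends without tracking.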

[0050] An alternate process flow is to not determine whether cookies are enabled. Rather, query block 32 would not exist and the method would simply proceed to query block 36 and thence to blocks 40 and 42. Users who deny the cookie will go through the selection routine on every visit, but since the cookie can never be written to or stored on the visitor computer, the data mining code will never execute.

[0051] Included with the web page is the data mining code that includes cookie processing script. A portion of the script, used to implement the method shown in FIG. 2, is listed in APPENDIX. Line number 33 of APPENDIX includes the command that selectively executes the data tracking function of the remainder of the data mining code. In the code shown, the data mining code will execute if the “track” value maintained within a cookie stored on the visitor's computer is “true”; otherwise the data mining code will not operate on the client node 14. As stated above, the data mining code typically includes an image request function that attaches relevant data to the request, which is forwarded to the service provider's data collection server 20. The returned image is only 1×1 pixel in size and is thus too small to see. Other known methods create an image object that is never actually written to the screen and thus is not size-limited.

[0052] If it is determined in query block 32 that cookies can be stored on the visitor's computer, then the method proceeds to query block 36. For new visitors to a particular site, no cookie associated with the site being tracked will typically exist on the client node 14. The visitor must then be selected (or not selected) in block 40 and the cookie created in block 42, as by first operating the JavaScript code contained in lines 1-7 in APPENDIX to set a random value and then, in line 17, comparing the random value to a sample interval set by the web site owner. The interval number can be set to any whole number, but is most preferably set to 10 (or 100) so that one in ten (or one in a hundred) of the visitors to the site cause the cookie “track” value to be set to “true” in block 42 and thus operate the data mining code in block 44.

[0053] Once the tracking cookie is set (or already exists) within the visitor profile on client node 14, that cookie is read in block 46 and queried in block 48 to determine whether the client node is to be tracked (as by operating block 44) or not. If the cookie “track” value is set to “false”, as in code line 24 in APPENDIX, then the method proceeds to block 50, where the tracking process ends. The if-else statement shown in lines 27-30 of the APPENDIX script results in a “track=false” setting for all circumstances where “track” does not equal true. One such circumstance where “track=false” is when the browser on visitor computer 14 is set to block cookies, so that no cookies can be set and stored on that computer.

[0054] If the cookie “track” value is instead set to “true”, then the data mining code is operated in block 44 to gather the visitor traffic data (e.g. the page viewed, the time of the view, the length of stay on the page, the visitor's identification, etc.), which is then transmitted to the data collection server 20 of the service provider utilizing the present invention.

[0055] A sample name-value pair cookie that is set according to the present invention may be stored on the visitor computer, within a user profile associated with the visitor operating the client node 14, as follows:

[0056] track true localhost/battlebots/ 0 1452168960 29502115 2223297120 29465905 *

[0057] where the cookie lists the tracking setting as set to “true”, thetracking domain (battlebots), and the associated tracking values forthat domain.

[0058] Because the data is sampled, the data actually received by data collection server 20 and passed to data analysis server 22 must be normalized to account for the fact that only a fraction of the actual visitors to a web site are being tracked. If, in line 17 of the sample JavaScript programming code shown in APPENDIX implementing the invention, the value of variable wtl_sampleinterval is set to ‘10’, then only approximately one in ten visitors are counted. Consequently, the counts of sampled visitors must be multiplied by a factor of ten so that a very close approximation of the actual visitors to the site is reflected in the reports generated by database/report server 24. If variable wtl_sampleinterval is set to ‘100’, then only approximately one in a hundred are counted and the normalizing multiplier is 100×. It is understood, therefore, that the product of the sampling fraction (one divided by the sample interval) and the normalizing multiplier equals one; that is, the normalizing multiplier equals the sample interval.
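The normalization in [0058] multiplies sampled counts by the sample interval. The sketch below also attaches an approximate standard error using the ordinary binomial model, in which each visitor is independently selected with probability p = 1/interval. That error estimate is a standard statistical addition and is not specified in the text.

```javascript
// Extrapolate a sampled visitor count to an estimated total.
function normalizeCount(sampledCount, sampleInterval) {
  return sampledCount * sampleInterval;
}

// Approximate standard error of that estimate under a binomial model:
// sampled ~ Binomial(N, p), p = 1 / sampleInterval, so the estimate
// sampled / p has variance N(1 - p) / p. Plugging in N ~ sampled / p
// gives SE ~ sqrt(sampled * (1 - p)) / p = sqrt(sampled * (1 - p)) * sampleInterval.
function normalizedStdError(sampledCount, sampleInterval) {
  const p = 1 / sampleInterval;
  return Math.sqrt(sampledCount * (1 - p)) * sampleInterval;
}

const estimate = normalizeCount(1000, 10);      // 10,000 visitors
const stdError = normalizedStdError(1000, 10);  // about 300
```

With 1,000 sampled visitors at an interval of 10, the estimate is 10,000 with a standard error of about 300, or roughly ±590 at about 95% confidence. The relative error shrinks as the number of sampled visitors grows, which is why a coarse interval suits only very busy sites.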

[0059] In summary, the process for Client-Side Sampling is implementedas follows according to a first embodiment of the invention:

[0060] 1. The JavaScript executing on the client machine determineswhether or not to execute the tracking functionality (based on somerandom criteria).

[0061] 2. The JavaScript sets a cookie for the site on the client'smachine, indicating whether or not the visitor is included in the samplepopulation.

[0062] 3. Each time the JavaScript is invoked, it checks the samplingcookie to see if it should report the client's traffic to the serviceprovider.
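The three steps above can be sketched as follows. This is not the APPENDIX script: the cookie store is abstracted as any object with `get`/`set` (a `Map` here, a `document.cookie` wrapper in a browser), and `trackPage` is a hypothetical stand-in for the data mining code. Block numbers in the comments refer to FIG. 2.

```javascript
// Select with probability 1/sampleInterval (blocks 40 and 42).
function decideSelection(sampleInterval, rand = Math.random) {
  return Math.floor(rand() * sampleInterval) === 0;
}

// Client-side sampling per FIG. 2. store.get() returns undefined when no
// cookie exists, which is always the case when cookies are blocked.
function runSampledTracking(store, sampleInterval, trackPage, rand = Math.random) {
  if (store.get("track") === undefined) {          // block 36: no cookie yet
    const track = decideSelection(sampleInterval, rand) ? "true" : "false";
    store.set("track", track);                     // block 42: store the selection
  }
  if (store.get("track") === "true") {             // blocks 46 and 48: read and test
    trackPage();                                   // block 44: run the data mining code
    return true;
  }
  return false;                                    // block 50: tracking ends
}

const store = new Map();
let reported = 0;
runSampledTracking(store, 10, () => reported++, () => 0.05); // selected: 0.05 * 10 < 1
runSampledTracking(store, 10, () => reported++, () => 0.99); // cookie already says "true"
```

Because the decision is stored, the second call reports the visitor even though its random draw would not have selected them. A cookie-blocking store never returns "true", so such visitors are never reported, as in [0050].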

[0063] Although the cookie is described above as being set and stored on the client machine, those skilled in the art would recognize that the invention is not so limited. Many modern networked computers operate under the principle of “profiles”, where each user within a network has his or her own settings stored on the network. Logging on to any computer on the network causes the profile (typically stored on a central server within the LAN) to be loaded onto that computer. Cookie values for the person logging in are included within that profile. Thus, the cookie “track” value (also referred to herein as the selection indicator) is capable of following the visitor no matter which workstation the visitor happens to be using at the time. Similarly, a single client node computer 14 may host several visitors, where some are selected for inclusion within the tracking subset and others are not. Accordingly, the term “visitor computer” or “client computer” is not intended to be limited to any single machine but rather could encompass selection of the machine itself and/or the visitor currently operating the machine.

[0064] Pure Client-Side Sampling is not required, however. In a second embodiment of the invention, the service provider's servers (or those hosting the web site) are minimally involved to set the sample “track” value of the cookie for new visitors:

[0065] i. The JavaScript looks for a local cookie. If no such cookie exists, the JavaScript makes a request to the service, e.g. an image request to data collection server 20.

[0066] ii. The service identifies new users and randomly selects the visitor for tracking. The result of that selection is passed back to the JavaScript on the client via the image size and/or color. For example, a 1×1 image might indicate that the visitor is to be tracked, while a 1×2 image indicates that the visitor is not to be tracked. Similarly, a black pixel image might represent “track=true” while a white pixel represents “track=false”. The cookie value and the return image interpretation are considered examples of “selection indicators.”

[0067] iii. Based on the returned image size/color, the JavaScript then sets a local cookie containing the tracking selection value (e.g. “true” or “false”).
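Steps i to iii can be sketched with the selection made by the service and returned through the image dimensions. The 1×1 versus 1×2 convention follows [0066]; reading “1×2” as width 1 and height 2, the function names, and the collector URL in the browser comment are assumptions.

```javascript
// Service side (step ii): select roughly one visitor in sampleInterval and
// encode the result in the size of the returned image.
function chooseBeaconSize(sampleInterval, rand = Math.random) {
  const selected = Math.floor(rand() * sampleInterval) === 0;
  return selected ? { width: 1, height: 1 } : { width: 1, height: 2 };
}

// Client side (step iii): turn the loaded image's size into the cookie value.
function trackValueFromImage(img) {
  return img.width === 1 && img.height === 1 ? "true" : "false";
}

// In a browser, step iii might look like (hypothetical URL):
//   const img = new Image();
//   img.onload = () => {
//     document.cookie = "track=" + trackValueFromImage(img) + "; path=/";
//   };
//   img.src = "https://collector.example.com/select.gif";
```

Encoding the answer in the image keeps the exchange to a single ordinary image request, which works in browsers without any additional scripting interface.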

[0068] A flow diagram implementing the second method for sampling is shown in FIG. 3. The process proceeds similarly to that shown in FIG. 2 for pure client-side sampling, whereby the process starts in block 30 with a request for a web page. If in query block 32 it is determined that the computer browser is set so that no cookies are accepted, then the process ends in block 34 and the visitor is not tracked. If it is determined that cookies are accepted in query block 32, and that a cookie exists in query block 36, then the cookie “track” value is read and, if “track=true”, query block 48 proceeds to block 44, in which the JavaScript data mining code embedded within the returned web page gathers data about the visitor and transmits the information to the data gathering server 20. If “track=false”, query block 48 proceeds to block 50, in which the JavaScript data mining code embedded within the returned web page is ignored and no traffic information is returned to the data gathering server 20.

[0069] If instead in query block 36 it is determined that a cookie does not exist on the machine (e.g. the visitor is new to the site, the cookie has expired, or cookies cannot be stored), then the process proceeds to block 52, which implements a request to the service provider for a tracking select cookie. Code stored within the server operates in a similar manner to the script shown in APPENDIX to randomly select visitors in block 54, as by generating a random number, comparing the randomly generated number to a sample ratio, and returning a cookie to the visitor computer 14 indicating whether the visitor is selected (“track=true”) or not (“track=false”). Subsequent visits by the visitor to the web site will result in the newly stored cookie being read in query block 48, and the data mining code operated or not operated according to the cookie “track” setting.

[0070] Again, the process flow can be configured so that blocks 32 and 34 do not exist. In that case, for a visitor computer that blocks cookies, the cookie can never be stored and thus never “exists” for the purposes of query block 36. The process will therefore run the selection routine in blocks 52, 54 and 56 every time for that visitor computer; the cookie is set but never stored, so no “true” value is ever read in query block 48 to cause the data mining code to execute in block 44.

[0071] One can imagine sampling the visits or the visitors. In visit sampling, the cookie expiration is set to a relatively short term, such as 30 minutes, to reflect that a user typically spends less than that period of time at the site during any one visit. If the cookie has expired, it is then assumed that the user is initiating a subsequent visit to the web site independent of the earlier one. By storing a local cookie and updating it on each request, visit sampling can be done by setting the cookie expiration. If there is a lapse of 30 minutes, the cookie expires and a new session is created.
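The sliding 30-minute window used for visit sampling can be sketched as a pure function over timestamps in milliseconds. The constant name and return shape are illustrative. A real script would write the new expiry into the cookie's `expires` attribute and, when a new visit starts, make a fresh sampling decision.

```javascript
const VISIT_TIMEOUT_MS = 30 * 60 * 1000; // 30-minute visit window from the text

// Called on every request. expiresAtMs is undefined when no cookie exists.
function touchVisitCookie(expiresAtMs, nowMs) {
  const startsNewVisit = expiresAtMs === undefined || nowMs >= expiresAtMs;
  // Each request pushes the expiry another 30 minutes into the future.
  return { startsNewVisit, expiresAtMs: nowMs + VISIT_TIMEOUT_MS };
}

const firstHit = touchVisitCookie(undefined, 0);                       // new visit
const laterHit = touchVisitCookie(firstHit.expiresAtMs, 20 * 60 * 1000); // same visit
const afterLapse = touchVisitCookie(laterHit.expiresAtMs, laterHit.expiresAtMs + 1); // new visit
```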

[0072] Under visitor sampling, which is the preferred method for sampled tracking, the cookie expiration is set to a relatively long term, such as 3 months, to track a single user over multiple site visits. By storing a local cookie, visitor sampling can be done by setting a permanent, or long-term, cookie. The visitor then retains that setting for future sessions.

[0073] An example of a JavaScript subroutine that sets the cookie expiration date at 180 days is shown by the following code:

var exp = new Date();                           // current date and time
var newexp = exp.getTime() + (86400000 * 180);  // 86,400,000 ms per day, times 180 days
exp.setTime(newexp);
// cval is the "name=value" string built earlier by the script (e.g. "track=true")
document.cookie = cval + "; expires=" + exp.toUTCString();

[0078] In both the methods shown in FIGS. 2 and 3, the data mining JavaScript code is transmitted with the requested web page whether the visitor is to be tracked or not. Although the data mining code is typically small relative to the code implementing the web page, this creates inefficiencies in that data is being transmitted that will not be used. Preventing that code from being transmitted in the first place would then reduce the bandwidth requirements for serving the web site.

[0079] FIG. 4 illustrates a method for implementing a feature whereby the data mining code is conditionally included in the data sent to the visitor computer 14 for operating on that computer's web browser. The process starts in block 60 with a request by the visitor computer 14 for a web page. The request results in, among other data, a URL identifying the web page and a cookie associated with the URL being sent through the worldwide network 10. The request is routed through the worldwide network and received at the customer site web server 12. Query block 62, operating within web server 12, determines whether a cookie accompanied the request. If not, then the process proceeds to block 64, in which a cookie is generated; block 66, where the cookie “track” value is set to true or false according to some algorithm such as those described above; and block 68, where the cookie (including the “track” value) is sent back to the visitor computer for storage.

[0080] If the newly created cookie, or a preexisting one, includes a “track” value that is set to true, then the requested web page is returned to the visitor computer with the data mining code appended in block 72. The data mining code, when operated by the visitor computer's browser, then gathers the data in block 74 and transmits it to the tracking server 20 by known means. If the cookie “track” value is not set to “true” (e.g. it is instead set to false), then the process proceeds to block 76, where only the requested web page is returned from customer site server 12 to visitor computer 14; the data mining code is not appended.

[0081] Code for the Server Side Include may be configured as follows:
<% If Request.Cookies("track") <> "false" Then %>
<!-- #include virtual="/includes/trackingcode.inc" -->
<% End If %>

[0082] This code causes an IIS web server to include the data mining code within the source of an Active Server Page unless a cookie has been successfully set to “track=false.” A visitor whose browser sent no “track” cookie therefore still receives the data mining code.

[0083] Having described and illustrated the principles of the invention in a preferred embodiment thereof, it should be apparent that the invention can be modified in arrangement and detail without departing from such principles. I claim all modifications and variations coming within the spirit and scope of the following claims.

I claim:
 1. A method for tracking and reporting traffic activity on a web site, comprising: storing a web page on a first server coupled to a network; requesting the web page from a visitor computer; selecting the visitor computer and/or visitor for inclusion or non-inclusion within a sample group, said sample group being a subset of total traffic to the web site; storing on the visitor computer a selection indicator associated with the inclusion or non-inclusion; and tracking traffic activity to the web site from the visitor computer and/or visitor only if the visitor computer and/or visitor is a member within the sample group, otherwise ignoring the traffic activity from the visitor computer and/or visitor.
 2. The method of claim 1, wherein the web page includes web page code, data mining code, and cookie processing script, the method further including the step of operating the cookie processing script on the visitor computer to generate the selection indicator.
 3. The method of claim 1, the method further including: storing cookie processing script on a second server; receiving a request from the visitor computer at the second server; operating the cookie processing script responsive to the request to generate the selection indicator; and returning the selection indicator to the visitor computer for storage.
 4. The method of claim 1, wherein the first server includes cookie processing script, the method further including the steps of: operating the cookie processing script responsive to the requesting step to generate the selection indicator; and returning the selection indicator to the visitor computer for storage.
 5. The method of claim 4, further including the steps of: embedding an image request within the web page; causing the image request to be sent to a second server; returning an image responsive to the image request; and setting the selection indicator responsive to the image.
 6. The method of claim 1, further including the steps of: receiving an image at the visitor computer responsive to the web page request; and setting the selection indicator to “true” at the visitor computer responsive to a first type of received image, otherwise setting the selection indicator to “false” responsive to a second type of image, wherein the image type is one selected from the group consisting of size and color.
 7. The method of claim 6, wherein the selection indicator is set to “true” at the visitor computer responsive to the received image being 1×1 pixel in size, and wherein the selection indicator is set to “false” responsive to the received image being 1×2 pixels in size.
 8. The method of claim 6, wherein the selection indicator is set to “true” responsive to the received image having a first color, and wherein the selection indicator is set to “false” responsive to the received image having a second color.
 9. The method of claim 1, further including the steps of: setting a normalization multiplier in accordance with a ratio between the sample group and the total traffic on the web site; normalizing the traffic activity by the normalization multiplier; and posting a report including the normalized traffic activity for viewing over the network.
 10. A method for tracking and reporting traffic activity on a web site, comprising the steps of: storing a web page on a first server coupled to a wide area network, said web page having web page code and data mining code including a cookie processing script; uploading the web page to a visitor computer responsive to a request over the wide area network from the visitor computer; operating the cookie processing script on the web browsing data to obtain at least one new cookie value, said new cookie value including a visitor selection value; and storing the new cookie on the visitor computer including the new cookie value.
 11. The method of claim 10, further including the step of operating the data mining code on the visitor computer to obtain web browsing data responsive to the visitor selection value.
 12. The method of claim 10, further including the step of operating the data mining code on the visitor computer to obtain web browsing data if the visitor selection value is set to “true”, otherwise not operating the data mining code on the visitor computer.
 13. The method of claim 10, further including the steps of: attaching the new cookie value to an image request associated with a designated URL source; and sending the image request to the URL source.
 14. The method of claim 10, further including the steps of: operating the data mining code on the visitor computer to obtain web browsing data; compiling the web browsing data into a web page traffic report; and posting the report for viewing over the wide area network.
 15. A method for tracking and reporting traffic activity on a web site stored on a web site server, comprising: receiving a request at the web site server for a web page from a visitor computer; determining whether the request is classified within a sample group; and returning the web page and associated data mining code for operating on the visitor computer if the request is within the sample group, otherwise returning just the web page.
 16. The method of claim 15, further including: receiving with the request a cookie including a selection indicator; and determining whether the request is classified within the sample group responsive to a value of the selection indicator.
 17. The method of claim 15, further including: generating a selection indicator responsive to the request; and returning the selection indicator to the visitor computer together with the web page.
 18. The method of claim 17, further including storing the selection indicator as a cookie within the visitor computer.
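Claims 7 and 9 involve two small calculations that can be sketched directly. The Python below is a hypothetical illustration, not the claimed implementation. It assumes image sizes arrive as (width, height) pixel tuples. It also assumes the ratio in claim 9 is the configured sampling rate (sample group divided by total traffic), so the multiplier is its reciprocal.

```python
from typing import Dict, Tuple


def indicator_from_image_size(size: Tuple[int, int]) -> str:
    """Claim 7: a 1x1 image means "true", a 1x2 image means "false".

    `size` is assumed to be (width, height) in pixels.
    """
    if size == (1, 1):
        return "true"
    if size == (1, 2):
        return "false"
    raise ValueError(f"unexpected indicator image size: {size}")


def normalization_multiplier(sample_rate: float) -> float:
    """Claim 9: scale factor derived from the sample-to-total traffic ratio."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    return 1.0 / sample_rate


def normalize(counts: Dict[str, int], multiplier: float) -> Dict[str, float]:
    """Estimate full-site counts from the counts observed in the sample."""
    return {key: count * multiplier for key, count in counts.items()}
```

For example, with a hypothetical 25% sample the multiplier is 4, so 37 sampled visits to a page would be reported as an estimated 148.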